99 research outputs found

    On Multilingual Training of Neural Dependency Parsers

    We show that a recently proposed neural dependency parser can be improved by joint training on multiple languages from the same family. The parser is implemented as a deep neural network whose only input is orthographic representations of words. In order to parse successfully, the network has to discover how linguistically relevant concepts can be inferred from word spellings. We analyze the representations of characters and words learned by the network to establish which properties of the languages were accounted for. In particular, we show that the parser has approximately learned to associate Latin characters with their Cyrillic counterparts and that it can group Polish and Russian words that have a similar grammatical function. Finally, we evaluate the parser on selected languages from the Universal Dependencies dataset and show that it is competitive with other recently proposed state-of-the-art methods, while having a simple structure.
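
    A minimal sketch, assuming PyTorch, of the general idea behind orthographic word representations: each word is encoded from its character sequence, so the model can relate words across scripts (e.g. Latin and Cyrillic) through the learned character embeddings. The module name and hyperparameters below are illustrative assumptions, not the authors' architecture.

        # Illustrative character-level word encoder (a sketch, not the paper's code).
        import torch
        import torch.nn as nn

        class CharWordEncoder(nn.Module):
            def __init__(self, n_chars, char_dim=32, word_dim=64):
                super().__init__()
                self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
                self.rnn = nn.LSTM(char_dim, word_dim // 2,
                                   bidirectional=True, batch_first=True)

            def forward(self, char_ids):                 # char_ids: (batch, word_len)
                x = self.char_emb(char_ids)              # (batch, word_len, char_dim)
                _, (h, _) = self.rnn(x)                  # h: (2, batch, word_dim // 2)
                return torch.cat([h[0], h[1]], dim=-1)   # (batch, word_dim)

        # Toy usage: encode two padded character-id sequences into word vectors.
        encoder = CharWordEncoder(n_chars=100)
        words = torch.tensor([[5, 12, 7, 0], [9, 3, 0, 0]])
        print(encoder(words).shape)                      # torch.Size([2, 64])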

    A derivational model of discontinuous parsing

    The notion of latent-variable probabilistic context-free derivation of syntactic structures is enhanced to allow heads and unrestricted discontinuities. The chosen formalization covers both constituent parsing and dependency parsing. The derivational model is accompanied by an equivalent probabilistic automaton model. The new framework yields a probability distribution over the space of all discontinuous parses, which lends itself to intrinsic evaluation in terms of perplexity, as shown in experiments.
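
    As a point of reference, perplexity-based intrinsic evaluation generally amounts to exponentiating the average negative log-probability the model assigns to held-out data. The sketch below shows that generic computation with made-up numbers; it is not the paper's evaluation code.

        # Generic corpus perplexity from model log-probabilities (a sketch; the
        # numbers and token count are invented for illustration).
        import math

        def perplexity(log_probs, n_tokens):
            """log_probs: natural-log probabilities of held-out sentences under the
            model; n_tokens: total number of tokens in those sentences."""
            return math.exp(-sum(log_probs) / n_tokens)

        print(perplexity([-12.3, -8.7, -15.1], n_tokens=20))  # ~6.08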

    Taking SPARQL 1.1 extensions into account in the SWIP system

    The SWIP system aims at hiding the complexity of expressing a query in a graph query language such as SPARQL. We propose a mechanism by which a query expressed in natural language is translated into a SPARQL query. Our system analyses the sentence in order to identify concepts, instances, and relations. It then generates a query in an internal format called the pivot language. Finally, it selects pre-written query patterns and instantiates them with the keywords of the initial query. The candidate queries are presented as explanatory natural language sentences, among which the user selects the one he/she is actually interested in. We are currently focusing on new kinds of queries handled by the new version of our system, which is now based on version 1.1 of SPARQL.
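
    A hedged sketch of the pattern-instantiation step the abstract describes: keywords produced by the pivot-language analysis are slotted into a pre-written SPARQL 1.1 pattern. The prefix, property name, and pattern below are invented for illustration and are not taken from SWIP.

        # Toy pattern instantiation (illustrative only; not the SWIP implementation).
        PATTERN = """\
        PREFIX ex: <http://example.org/ontology#>
        SELECT ?author (COUNT(?work) AS ?nWorks)   # aggregates are a SPARQL 1.1 feature
        WHERE {{
          ?work ex:{relation} ?author .
        }}
        GROUP BY ?author
        """

        def instantiate(pivot_query):
            """pivot_query: keyword/relation structure produced by the analysis step."""
            return PATTERN.format(relation=pivot_query["relation"])

        print(instantiate({"relation": "writtenBy"}))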

    Integrating isotopes and documentary evidence : dietary patterns in a late medieval and early modern mining community, Sweden

    We would like to thank the Archaeological Research Laboratory, Stockholm University, Sweden and the Tandem Laboratory (Ångström Laboratory), Uppsala University, Sweden, for undertaking the analyses of stable nitrogen and carbon isotopes in both human and animal collagen samples. Also, thanks to Elin Ahlin Sundman for providing the δ13C and δ15N values for animal references from Västerås. This research (Bäckström’s PhD employment at Lund University, Sweden) was supported by the Berit Wallenberg Foundation (BWS 2010.0176) and Jakob and Johan Söderberg’s foundation. The ‘Sala project’ (excavations and analyses) has been funded by Riksens Clenodium, Jernkontoret, Birgit and Gad Rausing’s Foundation, SAU’s Research Foundation, the Royal Physiographic Society of Lund, Berit Wallenbergs Foundation, Åke Wibergs Foundation, Lars Hiertas Memory, Helge Ax:son Johnson’s Foundation and The Royal Swedish Academy of Sciences.

    Splitting Arabic Texts into Elementary Discourse Units

    In this article, we present the first work investigating the feasibility of Arabic discourse segmentation into elementary discourse units within the Segmented Discourse Representation Theory framework. We first describe our annotation scheme, which defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. We then propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features, and we investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieving good results on both corpora. In addition, we show that adding chunk features does not boost the performance of our system.
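
    A minimal sketch, assuming scikit-learn, of the kind of feature-based multiclass setup the abstract describes: one feature dictionary per token (punctuation, morphological, lexical, and shallow syntactic cues) and a classifier that predicts boundary labels, including labels for nested units. Feature names, labels, and the tiny data set are invented for illustration; this is not the authors' system.

        # Illustrative multiclass boundary classifier over hand-crafted token features.
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # One feature dict per token; the label encodes its (possibly nested) boundary status.
        X = [
            {"token": "w", "pos": "CONJ", "is_punct": False, "prev_pos": "VERB"},
            {"token": ",", "pos": "PUNCT", "is_punct": True, "prev_pos": "NOUN"},
            {"token": "qAl", "pos": "VERB", "is_punct": False, "prev_pos": "PUNCT"},
        ]
        y = ["BEGIN", "INSIDE", "BEGIN-NESTED"]

        model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(X, y)
        print(model.predict([{"token": "vm", "pos": "CONJ",
                              "is_punct": False, "prev_pos": "VERB"}]))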
